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Abstract 



Forthcoming high-resolution observations of the Cosmic Microwave Background 
(CMB) radiation will generate datasets many orders of magnitude larger than have 
been obtained to date. The size and complexity of such datasets presents a very serious 
| challenge to analysing them with existing or anticipated computers. Here we present an 

investigation of the currently favored algorithm for obtaining the power spectrum from 
a sky-temperature map — the quadratic estimator. We show that, whilst improving 
on direct evaluation of the likelihood function, current implementations still inherently 



scale as the equivalent of O(Np) in the number of pixels or worse, and demonstrate the 



critical importance of choosing the right implementation for a particular dataset. 
O 1 PACS numbers: 98.80, 98.70.V, 02.70 



1 Introduction 



Over the next ten years a number of ground-based, balloon-borne and satellite observa- 
tions of the Cosmic Microwave Background (CMB) are planned with sufficient resolution 
to determine the CMB power spectrum up to multipoles I ~ 1000 or more (for a general 
review of forthcoming observations see jl]]). According to current theory this will provide 
us with the locations, amplitudes, and shapes of the Doppler peaks, and hence the values of 
the fundamental cosmological parameters to unprecedented accuracy. The CMB will then 
have lived up to its promise of being the most powerful discriminant between cosmological 
models 

In preparation for these datasets considerable effort is being put into developing ways of 
extracting the information they contain. Typically the raw data is cleaned and converted 
into a time-ordered dataset. This is then turned into a sky temperature map, and the map 
analysed to find its power spectrum. Having obtained the power spectrum of the dataset 
we can compare it with the predictions of any class of cosmological models to determine 
the most likely values of the parameters associated with that class. Whilst it would also 
be possible to estimate such cosmological parameters directly from the data, this would 
require the assumption of a class of models during the data analysis. We therefore choose 
to provide the more generic result of the power spectrum. 

Here we consider the analysis of an N p pixel map from a simple pointing experiment 
for multipoles 1 < I < Ni in bins 1 < b < N b — ie. we determine the location of and the 
curvature about the peak of the maximum likelihood function of the binned power spectrum 
coefficients C&. 



2 Maximum Likelihood Analysis 

Any observation of the CMB contains both signal and noise 

Aj = Si + Hi (1) 

at each pixel. For independent, zero-mean, signal and noise the covariance matrix of the 
data 

M^(AA T ) = (ss T ) + (nn T ) (2) 



is a symmetric, positive definite and dense. Given any binned power spectrum Cf, and a 
shape parameter Cf within each bin such that 

Q = CfC b leb (3) 

we can construct the signal covariance matrix; for a simple pointing experiment this is 
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where Bi is the multipole beam map and 6„i is the angular separation of pixels i, i' . Taking 
the CMB fluctuations to be Gaussian is not only consistent with the favoured inflationary 
cosmologies but also has the maximum entropy if we make no assumption about the higher 
moments of the data predicted to be non-zero in defect-based models. The probability of 
the observed dataset given the assumed power spectrum is then 



C(C) = P(A\C) 
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Assuming a uniform prior, so that P{C | A) oc P(A \ C), the most likely power spectrum 
will be that which maximizes £(C), with covariance matrix 
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3 Direct Evaluation 

Historically the most likely power spectrum has usually been obtained by evaluating C{C) 
directly over the bin parameter space to locate its maximum (for example for the COBE 
data H § , with the additional refinement of using a complete set of cut-sky basis functions 
in place of the incomplete spherical harmonics). To date the fastest general solution uses a 
Cholesky decomposition of the matrix M, costing 0(N 2 ) in size and O(Np) in time for a 
single point in parameter space. 

Algorithms for searching the AVdimensional parameter space — such as maximum 
gradient ascent — typically require O(Nb) evaluations at each of many steps. Moreover, 
calculating the covariance matrix at the maximum by discrete differencing requires a further 
0(N 2 ) evaluations. Overall therefore current implementations of this algorithm scale as 
O(Np) in size and 0{N 2 Np) in time, and become hopelessly intractable for any of the 
anticipated datasets. 

Although there have been some attempts to improve on this scaling — for example by 
transforming to the signal-to- noise eigenbasis 0] , using approximations for the determinant 
[]|], or assuming azimuthally symmetric noise ||] — none has provided a fast way to search 
a high dimensional multipole bin parameter space under an arbitrarily complex dataset. 



4 Quadratic Estimators 



Since we are interested both in a rapid search for the maximum of C, and in evaluating the 
curvature matrix of £ at this maximum, we solve 
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iteratively by the Newton-Raphson method. Starting from some (sufficiently good) target 
power spectrum C the correction 
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gives rapid convergence to the maximum of C. 

Taking the log and repeatedly differentiating equation (JsT) 
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Now if instead of the computationally intensive full curvature matrix we settle for its much 
simpler ensemble average (ie. the Fisher information matrix) we have 
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and equation (0) reduces to 
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Note that this procedure both locates the maximum and generates the (albeit approximated) 
covariance matrix i* 1 . 

The most computationally expensive calculation here is still the evaluation of the Fisher 
matrix, for which two methods have been proposed. Noting that, from equation (|j), the 
derivative matrix for each bin 
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is independent of iterative step, Bond, Jaffe and Knox pi] calculate them explicitly and solve 
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column by column for each bin. The first two rows of table 1 shows the cost of evaluating 
the Fisher matrix this way. 

Alternatively, Tegmark [10] has pointed out that each (N p x N p ) Legendre polynomial 
matrix can be factorised into the product of the corresponding (N p x (21 + 1)) spherical 
harmonic matrix and its transpose 
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for the real spherical harmonic Yi m in the direction of pixel i. Now 

^Pr = E C l B l Y l Y l (16) 

aUb leb 

and we can use the invariance of the trace of a product of matrices under cyclic permutations 
to rewrite equation JI(f) as 
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and solve 

MX l = Y l (18) 
column by column for each multipole, and 

Z iV = Y l T X l , (19) 

for each pair of multipoles, and hence each pair of bins. The last three rows of table 1 shows 
the cost of evaluating the Fisher matrix this way. 

For CMB observations we have Nf, <C N p , so that the first algorithm (Al) scales as 
O(NbNp) in size and O(NbNp) in time. Similarly Nf > Np, with approximate equality 
for all-sky maps, so that the second algorithm (A2) scales as O(N^) in size and 0(N^ N p ) 
in time. Table 2 shows the implications for a range of future experiments, scaled from 
implementations of each algorithm applied to an unbinned reduced COBE dataset. Note 
that no assumption has been made about binning in the MAP and PLANCK datasets. 



5 Conclusions 

We have implemented two algorithms using the quadratic estimator as a means of determin- 
ing the maximum likelihood power spectrum and its covariance matrix from a pixelized map 
of the CMB. Despite previous claims, whilst each is an improvement on direct evaluation 
of the likelihood function, neither scales better in time than O(Np) in the number of pixels 
in the map. Ultimately the advantage of each is in a reduction of the scaling prefactor as 
compared with direct evaluation. 

Comparing the two algorithms it is apparent that the choice of which to use for a 
particular dataset is critical — with timings differing by up to a factor of 1000. Broadly 
speaking, observations of small patches of the sky, where Ni 3> y/N p , should be analysed 
using Al, whilst all-sky maps, with iVj ~ \/N p , should be analysed using A2. 

All timings have been scaled from a small dataset analysed on a SUN Ultra II. Two 
further considerations immediately apply. 

• Moving to parallel architectures will give significant reduction in these timings. Im- 
plementation of each algorithm on the 512 processor Cray T3E at NERSC indicates 
that the improvement can be up to a factor of 1000. However, this does assume that 
we continue to keep all the necessary matrices simultaneously in core; any reduction 
to vector operations, relocation to disc, or recalculation will dramatically reduce this 
improvement. 
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• The datasets under consideration will be obtained incrementally over the next 10 
years. We should therefore take into consideration Moore's law — that computer 
power doubles every 18 months — to allow for corresponding increases in available 
memory and speed. Current trends do not, however, suggest any significant increase 
in the total parallel processor time (O(10 4 ) hours) available to us. 

Taken together, we can conclude that these algorithms, judiciously applied, will be sufficient 
to analyse 10 4 pixel datasets immediately, the 10 5 pixel datasets expected in the next 2 years 
some 6 years from now, and the 10 6 pixel datasets expected in 5 - 10 years only 16 years 
from now. However, since we would like to be able to analyse not only the actual datasets 
as soon as they are obtained, but also simulated datasets in advance of the observations, 
improved algorithms are still essential. 
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TERM 
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OPERATIONS 


X b = M~ l j£- Vb 
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Table 1: Scaling in the calculation of the Fisher matrix F for the two quadratic estimator 
algorithms Al (first two rows), A2 (last three rows). 
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SIZE 


TIME 


DATASET 


N p 


N b 


Ni 


Al 


A2 


Al 


A2 










0{N b Nl) 


0{N?) 


0(Nb Np) 


0(NfN p ) 


COBE 


10 3 


30 


30 


240 Mb 


8 Mb 


15 min 


1 min 


MAXIMA/ 


10 4 


20 


1000 


16 Gb 


8 Tb 


7 days 


20 years 


BOOMERANG 


to 10 5 


20 


1000 


1.6 Tb 


8 Tb 


20 years 


200 years 


MAP/PLANCK 


10 6 


1000 


1000 


8 Pb 


8 Tb 


1 Myears 


2 Kyears 



Table 2: Size and time costs for the calculation of the Fisher matrix F for archetypal 
datasets on a SUN Ultra II for the two quadratic estimator algorithms Al, A2. 
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